Abstract

We present a novel topological classification of RNA secondary structures with 
pseudoknots. It is based on the topological genus of the circular diagram associated 
to the RNA base-pair structure. The genus is a non-negative integer, whose value
quantifies the topological complexity of the folded RNA structure. In such a representation,
planar diagrams correspond to pure RNA secondary structures and have
zero genus, whereas non-planar diagrams correspond to pseudoknotted structures and
have higher genus. We analyze real RNA structures from the databases wwPDB and 
Pseudobase, and classify them according to their topological genus. We compare the 
results of our statistical survey with existing theoretical and numerical models. We 
also discuss possible applications of this classification and show how it can be used for 
identifying new RNA structural motifs. 
Introduction



In their biologically active form, RNA molecules are folded into fairly well-defined three-dimensional
structures [1]. These structures are strongly constrained by the pairing of conjugate bases along the
sequence, but also depend on the ionic strength of the solution [2]. It has proved very useful to
describe the pairing of RNA in terms of secondary structures and pseudoknots [3]. These structural
elements can be viewed as motifs which appear repeatedly in the folds. The main structural motifs
of secondary structures are helical duplexes, single-stranded regions, hairpin stems, hairpin loops,
bulges and internal loops, junctions and multiloops (see table 1).

It is convenient at this stage to introduce some standard graphical representations of RNA structures.
In the linear representation, one writes the base sequence on an oriented straight line, running from
the 5' end to the 3' end. By replacing the straight line with a closed circle one obtains the circular
representation. The pairing of two bases is represented by a dotted or colored line joining the two
bases on the upper side of the straight 5'-to-3' line. In the circular representation, pairings are
drawn inside the circle. Either representation associates a unique diagram to any set of RNA base
pairings. In both, a diagram represents a secondary structure if it involves only pairings which do not
cross [4]. In table 2 (top row) we show a secondary structure together with its two representations
(linear and circular, in the fourth and fifth columns, respectively). Similarly, a diagram contains a
pseudoknot if it contains pairings which do cross (see, e.g., the bottom row of table 2).

There are quite a few methods to predict secondary structures. Energy-based methods have proven
to be the most reliable (see, e.g., [16, 17]). They assign some energy to the base pairings and some
entropy to the loops and bulges. In addition, they take into account stacking energies and assign
precise weights to specific patterns (tetraloops, multiloops, etc.) [18]. The lowest free energy folds are
obtained either by dynamic programming algorithms [19], or by computing the partition function of
the RNA molecule (213]. The main drawback of these energy-based methods is that they deal solely 
with secondary structures and cannot take into account pseudoknots in a systematic way. 

There are several computer programs that attempt to predict RNA-folding with pseudoknots, 
but the problem is still mostly unsolved (see, e.g. HH HH H31 [2H HH1 HZl I2H1 ; the list is 
not exhaustive). There exists, however, a novel approach: in order to include pseudoknots, the
RNA folding problem has been formulated in terms of a sophisticated mathematical theory, namely a
quantum matrix field theory [30]. Field theories of this type were first introduced in particle physics,
more precisely in Quantum Chromodynamics, in order to model the strong interactions [31].
Since then, they have been used in many mathematical problems, in combinatorics,
number theory, etc. (for a recent review see [32]). They involve a parameter N, the linear dimension of
the N x N matrices, which can be used as an expansion parameter for the theory (large-N expansion)
[31]. In the RNA folding problem, the matrix field theory can be expanded diagrammatically in
various parameters. The simplest expansion is in the number of pairings and can easily
be represented in terms of diagrams. These diagrams, which are the usual Feynman diagrams of
quantum field theory, can be viewed as the set of all possible pairings of the RNA, with the correct
corresponding Boltzmann weights [30, 33]. Another possible expansion is in powers of 1/N. As was
shown in a previous paper [30], this expansion relies on a topological number called the genus,
which characterizes the pairing. As we shall see, the genus of a diagram is defined by its embedding
on a two-dimensional surface: it is the minimal number of handles that the surface must have so
that the diagram can be drawn on it without crossings.

Secondary structures correspond to genus zero, that is, to planar structures: they can be drawn
on a sphere without crossings. The simplest pseudoknots, such as the "H-pseudoknot" (see table
2) or the kissing hairpin, correspond to genus 1: they can be drawn on a torus without crossings.
This classification of RNA structures allows us to capture the topological complexity of a
Table 1: Examples of basic RNA secondary structure motifs. From top to bottom: a single strand
(PDB 283D [5]), a helical duplex (PDB 405d [6]), a hairpin stem and loop (PDB 1e4p [7]), a bulge
(PDB 1r7w [8]), a multiloop (PDB 1kh6 [9]). From left to right: spacefill view, three-dimensional
structure, secondary structure motif. The pictures are made with MolPov [10], Jmol and PovRay
[11, 12].
Table 2: Top row: example of a secondary structure motif (a helix, PDB 1a51 [13]). Bottom row:
an example of a common RNA H-pseudoknot (PDB 1a60 [14]). From left to right: spacefill view,
three-dimensional structure, secondary structure (with base pairings from RNAView [15]), linear
representation and circular representation. The non-planar pairings (crossing arcs) are emphasized in red.

pseudoknot with a single integer number, the genus. It can be viewed as a kind of "quantum" number. 
It is reminiscent of the superfold families, such as CATH or SCOP [SHj, which have proven so useful in 
protein structure classification. In the literature other possible classifications of RNA structures with 
pseudoknots have been proposed, such as the ones in, e.g. , |3S]- However, the one we propose in this 
paper is the only one that is purely topological, i.e. independent of any three-dimensional embedding,
and that is based only on the classical topological expansion of closed bounded surfaces. This is also
the reason why this expansion can be derived mathematically with standard tools of combinatorial
topology. We believe that such a mathematical framework can be exploited far beyond the simple
classification of RNA pseudoknots, and could also be applied to RNA-folding predictions [34]. In this
work, however, we restrict ourselves to the problem of classifying known RNA structures.

In the following, we define the genus of a given diagram more precisely, and show how it
can be simply calculated. We then present an analysis of the genera found in two main databases which
contain RNA structures, namely PSEUDOBASE [37] and the wwPDB (the Worldwide Protein Data
Bank, which contains some RNAs). The RNA structures in the latter are also listed in the RNABase
database [38], which we also used as a reference database. We find that RNAs of sizes up to about 250
have a genus smaller than 2, whereas long RNAs, such as ribosomal RNA, may have a genus up to 18.

Materials and Methods 

The genus. The topological classification of RNA secondary structures with pseudoknots that we
propose is based on the concept of topological genus. We first review the definition of the genus of a given
diagram. Consider a diagram representing a pairing in the linear representation. The matrix field
theory representation of the problem suggests representing a pairing not by a single dotted line, but
rather by a double line (which should never be twisted) [30, 31]. A unique diagram in the
double line representation therefore corresponds to each dotted-line diagram. Some examples are shown in fig 1.
Each double line diagram is characterized by its number of double lines (i.e. the number of pairings
of the diagram), which we denote by P, and by its number of loops, denoted by L, which is the total number
of closed loops made with the (single) lines of the diagram. For instance, in fig 1 (bottom) and in
fig 2 the diagram has P = 3 double lines and L = 1 loop. The genus of the diagram is the integer
Figure 1: A schematic view of the double line representation (right) of a generic linear representation
of pairings (left). Example a) (top) represents a couple of stacked base pairs, and b) (bottom)
represents an H-pseudoknot embedded in a hairpin.

defined by

$$ g = \frac{P - L}{2} $$

Figure 2: This diagram represents a pseudoknot with genus g = 1, since it has P = 3 double lines and
L = 1 loop.
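
The definition of the genus in terms of P and L can be turned into a short computation. The following is a minimal Python sketch, not the authors' implementation. It assumes that a structure is given as a list of (i, j) base-pair positions with each base in at most one pairing, and it counts the loops by following the boundary of the double-line diagram. The function names `count_loops` and `genus` and the boundary-following rule are illustrative choices; unpaired bases are ignored since they do not affect the topology.

```python
def count_loops(pairs):
    """Number L of closed loops in the double-line diagram of `pairs`.

    Assumption: each base takes part in at most one pairing.
    """
    num_pairs = len(pairs)
    if num_pairs == 0:
        return 0
    # Keep only paired positions and relabel them 0 .. 2P-1 along the backbone.
    ends = sorted(pos for pair in pairs for pos in pair)
    rank = {pos: k for k, pos in enumerate(ends)}
    partner = [0] * (2 * num_pairs)
    for i, j in pairs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]
    # Follow the boundary: on reaching endpoint k, jump along the (untwisted)
    # double line to its partner and continue to the next endpoint.
    n = 2 * num_pairs
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        cycles += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % n
    # One boundary runs through the open 5'/3' ends and is not a closed loop.
    return cycles - 1


def genus(pairs):
    """Genus g = (P - L) / 2 of the diagram."""
    return (len(pairs) - count_loops(pairs)) // 2


# An H-pseudoknot nested in a hairpin (assumed arc layout): P = 3, L = 1.
example = [(0, 5), (1, 3), (2, 4)]
print(count_loops(example), genus(example))  # prints: 1 1
```

With this convention a single pairing has P = 1 and L = 1, hence genus 0, while two crossing pairings have P = 2 and L = 0, hence genus 1.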

The genus is related to the Euler characteristic of the diagram, and is a topological invariant of the
diagram. Its geometrical interpretation is quite simple. Consider a sphere with g handles: a sphere
with 0 handles is a sphere, a sphere with one handle is topologically equivalent to a torus, a sphere
with 2 handles is topologically equivalent to a double torus, etc. (see fig 3). The genus g of a diagram
is the minimum number of handles a sphere must have in order to be able to draw the diagram on
it without any crossing. The precise way to do so is unambiguously defined only when the diagram
does not have open dangling lines, such as the 5' or 3' ends. Therefore it is important to connect the
ends, as is done in the circular representation. However, it is more convenient to close the two ends
below the backbone line, which results in drawing the pairing arcs all on the exterior of the backbone
circle. In that way it is simple to see how the embedding of a pseudoknotted RNA structure on a
high-genus surface works. Mathematically speaking, the circle of the RNA backbone (when the 5' and
3' ends are connected) becomes the boundary of a hole or puncture on the surface, and the arcs corresponding
to the RNA base pairs are drawn on the surface without that hole. In fig 4 we show explicit examples
of diagrams having different genus. As can be seen, a diagram with genus 0 is planar, in that it can
be drawn on the sphere without crossing, and corresponds to a secondary structure. More generally,
it was shown in [SHI that the secondary structure diagrams are all the planar diagrams with g = 0. 
Likewise, in fig 4 one also sees how diagrams with non-zero genus g ≠ 0 can be drawn without any
crossing on a surface with g handles. Clearly, different diagrams can have the same genus. Thus, in
order to further simplify the classification, we first note that adding a pairing line parallel to an
existing one does not change the genus of the diagram, since it increases by one both the number of pairing
lines and the number of loops of the diagram. Therefore, all diagrams that differ only by parallel
pairings are topologically equivalent. We will thus use a reduced representation of the diagrams, where
Figure 3: First few terms of the topological expansion of closed oriented surfaces: the term g = 0 is a
sphere, g = 1 is a torus, g = 2 is a double torus, and so forth.




Figure 4: Any RNA circular diagram can be drawn on a closed surface with a suitable number of 
"handles" (the genus). For the sake of simplicity, in this figure all helices and set of pairings on the 
surfaces are schematically identified only by their color. Note that the circle of the RNA-backbone (in 
green) topologically corresponds to a hole (or puncture) on the surface. 
each pairing line can be replaced by any number of parallel pairings, as in fig 5. With this convention,

Figure 5: The genus of a diagram does not change by identifying a stack of paired bases with a single 
effective base-pair. 

it has been shown in [SHj that there are exactly 8 topologies of pseudoknots of genus 1 (see fig 6). These
topologies can also be uniquely identified as a) ABAB, b) ABACBC, c) ABCABC, d) ABCADBCD,
where each letter A, B, etc. indicates a specific helix (or set of helices) along the RNA backbone from
the 5' end to the 3' end. Note that one recognizes the standard H-pseudoknot (ABAB) and the kissing
hairpin (ABACBC) (diagrams a) and b) on the left of fig 6, respectively). Among the 8 pseudoknots of
genus 1, four are quite common in the databases (rows a) and b) of fig 6), two are very rare (row
c) of fig 6), and the remaining two have not been reported as of yet. We will discuss these pseudoknots in
more detail in the next section. Let us stress again that the genus captures the topological complexity
of the pseudoknots. It is not simply related to the number of crossings, or of pairings; it depends on
the intrinsic complexity of the pseudoknot. This complexity itself depends on which kinds of pairings are
considered, which is of course a matter of convention. Before discussing the statistics of the genus of pseudoknots
from the databases, let us address this question. As discussed in [30], there are many possible
non-canonical bonds between bases. We emphasize that our classification of RNA structures according
to their genus is well defined even when including non-canonical bonds, or more general
definitions of RNA-binding interactions (as long as such interactions are binary). The larger the number
of pairings, the higher the genus of the structure might be. However, the weaker bonds, such as the
Hoogsteen bonds, or even the wobble pairs, do not form the structure; they merely stabilize a structure
already formed by canonical pairings. Therefore, in the following, we shall consider only Watson-Crick
pairs between conjugate bases and G-U wobble pairs.
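
The reduced representation, in which a stack of parallel pairings counts as a single effective pairing, can be sketched in code. The following Python sketch is illustrative only: the function name `reduced_word` and the input format (a list of (i, j) base-pair positions, each base paired at most once) are assumptions, and helices are labeled A, B, C, ... in order of appearance from the 5' end. It only merges parallel pairings; it does not remove helices that cross nothing, which a full reduction to the primitive topologies would also require.

```python
import string


def reduced_word(pairs):
    """Collapse stacks of parallel pairings and return a word such as 'ABAB'."""
    arcs = sorted(tuple(sorted(p)) for p in pairs)
    if not arcs:
        return ""
    # Relabel paired positions 0 .. 2P-1 so that unpaired bases play no role.
    ends = sorted(pos for arc in arcs for pos in arc)
    rank = {pos: k for k, pos in enumerate(ends)}
    ranked = [(rank[i], rank[j]) for i, j in arcs]
    partner = {}
    for a, b in ranked:
        partner[a] = b
        partner[b] = a
    # An arc (a, b) is parallel to (a - 1, b + 1) when the latter exists.
    letters = {}  # rank position -> helix letter, set at outermost arcs only
    helix_count = 0
    for a, b in ranked:  # sorted by opening position
        if partner.get(a - 1) == b + 1:
            continue  # stacked on an outer pairing that is already labeled
        if helix_count >= len(string.ascii_uppercase):
            raise ValueError("more than 26 helices; extend the alphabet")
        letter = string.ascii_uppercase[helix_count]
        helix_count += 1
        letters[a] = letter
        letters[b] = letter
    return "".join(letters[k] for k in sorted(letters))


# Two stems of two pairs each, interleaved as in an H-pseudoknot.
print(reduced_word([(0, 7), (1, 6), (3, 10), (4, 9)]))  # ABAB
```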

Irreducibility and nesting. In many cases, the genus of a diagram is an additive quantity. For
instance, if we consider a succession of two H-pseudoknots (see fig 7, left), each one has genus 1, and
the total genus of the diagram is 2. In order to characterize the intrinsic complexity of a pseudoknot,
it is thus desirable to define the notion of irreducibility. A diagram is said to be irreducible if it
cannot be broken into two disconnected pieces by cutting a single line. The diagram on the left of fig 7 is
reducible, whereas the one on the right of fig 7 is irreducible. Any diagram can thus be decomposed
in a unique way into irreducible parts. The genus of a reducible diagram is the
sum of the genera of its irreducible components.
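
The decomposition into irreducible parts can be illustrated with a short sketch. The Python function below, whose name `irreducible_components` and input format (a list of (i, j) base-pair positions) are assumptions made for illustration, scans the backbone from the 5' end and starts a new component whenever every pairing opened so far has been closed, i.e. wherever a single cut of the backbone separates the diagram.

```python
def irreducible_components(pairs):
    """Split a diagram into the pieces separated by single backbone cuts."""
    arcs = sorted(tuple(sorted(p)) for p in pairs)
    components = []
    current = []
    reach = -1  # rightmost closing position seen so far
    for i, j in arcs:
        if current and i > reach:
            # Every pairing opened so far is closed: one cut separates here.
            components.append(current)
            current = []
        current.append((i, j))
        reach = max(reach, j)
    if current:
        components.append(current)
    return components


# Two H-pseudoknots in succession (assumed positions): two components.
two_h = [(0, 2), (1, 3), (4, 6), (5, 7)]
print(irreducible_components(two_h))
# [[(0, 2), (1, 3)], [(4, 6), (5, 7)]]
```

Summing the genus of each component then gives the total genus, in line with the additivity stated above.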

Similarly, if one considers the diagram of fig 8 (left), its genus is equal to 2. It is composed of an H
Figure 6: These are the only 8 types of irreducible pseudoknots with genus g = 1.

Figure 7: Example of a reducible pseudoknot (left) and an irreducible one (right). The reducible
pseudoknot can be split into two disconnected parts, as shown, by cutting the backbone only once. The
total genus is the sum of the genera of the two components (in this example the total genus is 2).
pseudoknot embedded inside another H pseudoknot. A diagram is said to be embedded or nested in
another if it can be removed by cutting two lines while the rest of the diagram stays connected in a
single component. The diagram on the left of fig 8 is nested, whereas the one on the right is not. It is
clear that the genus of a nested diagram is the sum of the genera of its nested components. As a result,
to any non-nested diagram of genus g there corresponds a nested diagram of the same genus, obtained
by adding a pairing line between the first base and the last base of the diagram. For instance, the 8
diagrams of genus 1 in fig 6 can be decomposed into 4 non-nested diagrams (left column) and 4 nested
diagrams (right column). Therefore, there are only 4 irreducible non-nested diagrams (a, b, c, d) of genus
1. As we shall see in the next section, pseudoknots (a) and (b) are quite common, pseudoknot (c) has
been seen but is rare, and pseudoknot (d) has not yet been seen. In the following, a pseudoknot which
is irreducible and non-nested is said to be primitive. Clearly, all RNA structures can be constructed
from primitive pseudoknots. The primitive diagram for secondary structures is obviously a single
pairing.




Figure 8: An example of a nested diagram (left) and of a non-nested one (right). A nested diagram can be
disconnected into two components by cutting the backbone at two points.



Results and Discussion 

Analysis of databases. There are several databases containing RNA structures. We have analyzed two
of them, namely Pseudobase [37] and the wwPDB [1] (through the RNABase database [38]).

Pseudobase 

At the time of writing, Pseudobase is a database containing 246 pseudoknots. These
pseudoknots have been deposited and validated by several research groups. They are subsegments
of larger RNA sequences, and are displayed in bracket form using several symbols (see fig 9). As an
example, we show below one of the pseudoknots from Pseudobase (accession number PKB210):

CGCUGCACUGAUCUGUCCUUGGGUCAGGCGGGGGAAGGCAACUUCCCAGGGGGCAACCCCGAACCGCAGCAGCGAC 
((((((::(((:::[[[[[[[::))):((((((((((::::)))))):((((::::)))):::)))):)))))):: 

AUUCACAAGGAA 
:::::]]]]]]]
((( )))....((( ))). ((( [[[)))...]]]..

Figure 9: The bracket notation is commonly used for representing RNA secondary structures with
simple pseudoknots. One stem of the pseudoknot is represented by parentheses and the
other stem by brackets. A dot "." indicates a free base.
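
Bracket strings of this kind are easy to convert into a list of pairings. The sketch below is a minimal Python example, not the parser used for the survey: the function name `parse_brackets` is hypothetical, and it assumes that "(", "[" and "{" each open a stem closed by the matching symbol, while any other character (such as ":" or ".") marks a free base. Keeping one stack per bracket type allows stems of different types to cross.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


def parse_brackets(structure):
    """Return the sorted list of (i, j) pairings encoded by a bracket string."""
    close_to_open = {c: o for o, c in OPEN_TO_CLOSE.items()}
    stacks = {o: [] for o in OPEN_TO_CLOSE}  # one stack per bracket type
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(pos)
        elif symbol in close_to_open:
            stack = stacks[close_to_open[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        # Any other symbol is treated as an unpaired base.
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


# A short made-up string with two crossing stems.
print(parse_brackets("((:[[:)):]]"))
# [(0, 7), (1, 6), (3, 10), (4, 9)]
```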

A simple analysis shows that this is an H pseudoknot, of the type ABAB. Likewise, we analyzed all 
the 246 pseudoknots of Pseudobase and found that: 

• there are 238 H pseudoknots (or nested H pseudoknots) of the ABAB type with genus 1 

• there are 6 kissing hairpin pseudoknots (or nested) of the ABACBC type with genus 1 

• there is 1 pseudoknot of the type ABCABC (number PKB71) with genus 1 

• there is 1 pseudoknot of the type ABCDCADB (number PKB75) with genus 2 

Note that the pseudoknot PKB71, from the regulatory region of the alpha ribosomal protein operon 
(E.coli organism) is the unique example of the ABCABC pseudoknot in Pseudobase. Its structure is 

main]: 

UGUGCGUUUCCAUUUGAGUAUCCUGAAAACGGGCUUUUCAGCAUGGAACGUACAUAUUAAAUAGUAGGAGUGC
(((((((:(((((::::::::[[[[::::[[[[::::{{{{:)))))))))))):::::::::::::::::::

AUAGUGGCCCGUAUAGCAGGCAUUAACAUUCCUGA
:::::::]]]]:::::]]]]:::::::::::}}}}

Its irreducible structure is given in fig 6 (third from the top, on the left). However, looking at
sequence alignment, it is very likely that more than 20 other RNA sequences in the
EMBL database fj2 a contain pseudoknots of this kind (A. Mondragon, A. Torres-Larios and K.K. 
Swinger, Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University, 
Evanston, IL: private communication). 

The wwPDB databank 

The Worldwide Protein Data Bank (wwPDB) is a collection of databases comprising mostly crystallographic
and NMR structures of proteins [1]. In addition, as of today, it contains approximately 850
structures containing at least one RNA molecule. Among these structures, there are about 300 single
RNA structures, 200 containing several RNA fragments, 30 RNA/DNA complexes, 250 RNA/protein
complexes and 60 transfer RNAs.

Among these 850 structures, there are about 650 structures which obviously have genus 0 (very
short sequences, or single- or double-stranded RNA helices). The number of bases ranges from 22
(2glw.pdb), which is an H pseudoknot, to 2999 (chain 3 of 1sli.pdb), which has genus 15.

We have analyzed the remaining 200 structures according to the following scheme:

• removal of non-RNA molecules and extraction of the molecule of interest

• search for all pairings using the program RNAView

• selection of relevant pairings (Watson-Crick and G-U wobble)

• computation of the genus of the corresponding diagram
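
The selection step of this scheme can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual processing pipeline: the candidate pairings are assumed to be already available as 0-based index pairs (for instance extracted from an annotation program's output, whose format is not modeled here), and the names `CANONICAL` and `select_canonical` are hypothetical.

```python
# Unordered base combinations kept for the genus computation:
# Watson-Crick pairs (A-U, G-C) and the G-U wobble pair.
CANONICAL = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def select_canonical(sequence, pairs):
    """Keep only the pairings whose two bases form a Watson-Crick or G-U pair."""
    seq = sequence.upper()
    return [(i, j) for i, j in pairs if frozenset((seq[i], seq[j])) in CANONICAL]


# Toy input (made up for illustration): G-C and A-U are kept, G-A is dropped.
print(select_canonical("GAUC", [(0, 3), (1, 2), (0, 1)]))  # [(0, 3), (1, 2)]
```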

Our results can be summarized in the following way:

• Transfer RNAs, which are among the smallest RNAs (length of 78), are made of a single primitive
pseudoknot (irreducible and non-nested) of genus 1 (a kissing hairpin) nested inside an arch (see
fig 10).

Figure 10: A typical tRNA (PDB lew, [3E]). It has the genus 1 of a kissing hairpin pseudoknot.

• Larger RNAs, such as the RNAs of ribosomal 50S subunits (length larger than 2000), have total genera
less than 18. For an RNA with a non-designed random sequence of length L and without steric
constraints, the typical genus should be L/4 [33], which in the present case would be around
500. Even when steric constraints are included [44], the genus would be around 2000 x 0.14 ≈ 280.
In addition, if we analyze these sequences in terms of primitive pseudoknots, we find that most
of the structures are built from very simple primitive blocks, with genus 1 or 2, nested inside a
more complex pseudoknot of genus smaller than 8. In fig 11 we show an RNA of genus 7 and of
length 2825 (the B chain of 1vou.pdb [32]) made of 3 H-pseudoknots and 3 kissing hairpins, nested
inside a large kissing hairpin. In fig 12 we display an RNA of genus 9 and of length 2825 (the
B chain of 1vp0.pdb, of the 50S subunit of E. coli [46]), which is made of 3 H-pseudoknots and
2 kissing hairpins, nested in a primitive pseudoknot of genus 4.

• There is no hierarchical nesting of the pseudoknots: the general structure observed in all RNAs of
the PDB is that of several low-genus primitive pseudoknots in series, nested inside a possibly higher-genus
"scaffold pseudoknot". We show in fig 11 one example of the decomposition of a structure
(1vou.pdb, which is a 30S subunit of E. coli).
Figure 11: The B chain of PDB 1vou is an RNA of genus 7 and of length 2825 bases. On the right,
the outermost primitive arc structure is the pseudoknot of type b) in the second column of fig 6, which
has genus 1. Such a primitive structure is decorated by 6 additional simple pseudoknots of type H
and K (types a) and b) in the first column of fig 6, respectively).




Figure 12: The B chain of PDB 1vp0 is an RNA of genus 9 and of length 2825 bases. The outermost
primitive structure is similar to the one of fig 11, with a more complex decoration on the right-hand
part. There, a complex pseudoknot with genus 4 is included. Five simple H and K pseudoknots
complete the full decoration.
• In fig 13 (left), we plot the distribution of genera as a function of the length of the RNA. As
mentioned before, the genera are much lower than what is expected for random sequences, and
this is a manifestation of the specific design of RNA.

• In fig 13 (right), we plot a histogram of the statistics of primitive pseudoknots in the PDB. We
see that the genus of primitive pseudoknots is small, typically 1 or 2, and that the probability
of observing large genera is very small. This reflects the fact that complex pseudoknots are built
from many small primitive pseudoknots of low genus.
Figure 13: On the left: total genus as a function of the number of bases in the RNA molecule. The
interpolating dashed line emphasizes an overall linear behavior. On the right: histogram distribution 
of the number n of primitive pseudoknots as a function of their genus g for all RNA molecules in the 
wwPDB database. 



We conclude by reporting in table 3 the sorted list of all the PDB files with non-zero genus,
according to our classification. Note that our statistical analysis is affected by the inherent bias of the
PDB: the PDB sometimes contains many structures of the same molecule, and thus the structures used for
the statistical analysis are not independent.

Conclusion 

We have shown that RNA structures can be characterized by a topological number, namely their
genus. This genus is 0 for secondary structures (planar structures), and non-zero for pseudoknots. We
have shown how the complexity of an RNA structure can be analyzed in terms of so-called "primitive
pseudoknots". Any complex RNA structure can be uniquely decomposed into primitive
pseudoknots, concatenated sequentially and nested. A survey of the existing RNA structures shows
that even for large RNAs (≈ 3 kb) the genus remains small (smaller than 18), and that natural RNAs
have a genus which is much smaller than that of paired structures obtained from random sequences.
By capturing the intrinsic complexity of the structure, the genus provides a natural and powerful
classification of RNA. Finally, a statistical study shows that complex RNA structures are built from
low-genus primitive pseudoknots (genus 0, 1 or 2), and that the most complex primitive pseudoknots
have genus 13. In a forthcoming work, we will show how this concept of genus can be used to
actually predict the folded structure of RNA molecules.
total genus   PDB file accession number

1             1b23, 1c0a, 1c80, 1chz, 1eiy, 1euq, 1euy, 1f7u, 1f7v, 1fcw, 1ffy, 1fir, 1g59, 1gix-B, 1gix-C, 1grz,
              1gtr, 1i9v, 1112, 1jlu, 1jgo-D, 1jgp-D, 1jgq-D, 1kpd, 1kpy, 1kpz, 112x, 113d, 1mjl, 1mzp, 1n77,
              1o0b, 1o0c, 1qf6, 1qrs, 1qrt, 1qru, 1qtq, 1qu2, 1qu3, 1ser, 1szl, 1tn2, 1tra, 1ttt, 1u6b-B, 1x8w,
              1yfg, 1yg3, 1ymo, 1zzn-B, 2a43, 2a64, 2csx, 2fk6, 2glw, 2tpk, 2tra, 437d, 4tna, 4tra, 6tna,
              1asy-R, 1asy-S, 1asz-R, 1asz-S

2             1cx0, 1ddy, 1drz, 1ct4, 1exd, 1ffz, 1fg0, 1fka, 1pnx, 1sj3, 1sj4, 1sjf, 1u8d, 1vbx, 1vby, 1vbz,
              1vc0, 1vc5, 1vc6, 1vc7, 1y0q, 1y26, 1y27, 1yoq, 2a2c

3             1i97, 1n34, 1slh, 1voz, 1yl4-A

4             1ibm, 1fjg, 1hnw, 1hnx, 1hnz, 1hr0, 1i95, 1ibk, 1ibl, 1n32, 1n33, 1q86, 1vov, 1vox, 1xmo,
              1xmq-A, 1xnr, 1j5e

5             1i94, 1i96, 1n36, 1voq, 1vos, 1xnq, 2avy, 2aw7, 2aw7-A

6             1pns, 1voy-B

7             1c2w, 1vou-B, 1yl3-A

8             1ffk-0, 1vow-B

9             1vp0-B, 2aw4-B

10            1njm, 1njn, 1njo, 1njp, 2awb-B

11            1k0l, 1p9x, 1pnu, 1pny

12            1j5a, 1jzx, 1jzy, 1jzz, 1nwx, 1nwy-0, 1sml-0, 1xbp-0, 1y69-0

13            1nkw-0, 1ond, 2d3o

14            1jj2, 1k73, 1k8a-A, 1k9m-A, 1kc8, 1kdl, 1kqs-0, 1mlk, 1m90, 1n8r, 1nji, 1q7y, 1q82, 1qvf-0,
              1s72, 1vq4-0, 1vq5-0, 1vq7-0, 1vq8-0, 1vq9-0, 1vqk, 1vql, 1vql-0, 1vqm, 1vqn, 1vqo-0, 1vqp-0,
              1yhq-0, 1yi2-0, 1yij-0, 1yit-0, 1yj9-0, 1yjn-0, 1yjw-0, 2aar

15            1q81, 1qvg, 1sli-3, 1vq6-0

16

17            2aw4-B

18            2awb-B


Table 3: List of the PDB files we considered in this paper, sorted according to their total genus. The notation
xxxx-y indicates chain y of the PDB file with accession number xxxx.